Narrative Nexus: LLM-Enhanced Gaming Worlds
Discovering Immersive Gamer-NPC Experiences
Boyu Cao, Pranav Manjunath, Lingxiang Zhang
Motivation
The gaming industry has been evolving rapidly thanks to more advanced hardware and software development capabilities. Yet some NPC-driven and conversation-driven games, such as RPGs (role-playing games), have relied on the same mechanics for decades. With a weaker ability to hold gamers' attention and a lack of technological advancement, RPGs can become trapped in a vicious cycle: losing revenue and struggling to compete with other gaming categories. However, with the help of AI and LLMs, our team believes that NPC- and conversation-driven gaming mechanics can be drastically innovated, bringing a more immersive experience to players.
In this project, we explore new ways for players to interact with NPCs in video games by connecting Google's Gemini Pro to an NPC and facilitating active, stimulating conversations between gamers and NPCs. Unlike traditional pre-set NPC scripts that are fed to the gamer during a conversation, we give each NPC a specific background, game objectives, and prompt instructions, enabling it to actively engage with the gamer during the game. The task involves creating a platform (Narrative Nexus) that enables game designers to give NPCs highly personalized and dynamic dialogue. By leveraging LLMs, NPCs can converse with players based on the backgrounds and mission descriptions set by the game designers, offering a new level of interaction that is more engaging and immersive.
To demonstrate the feasibility of our application, we created a demo game called Echoes of Eternity and set up an NPC, Captain Jaela, to hold conversations with the gamer. The demo video is shown below, along with a thorough assessment by the team and three interviewees.
Approach
We incorporate Gemini Pro as the LLM that plays the NPC and builds an engaging interaction system between the player and the NPC. We supply the model with the NPC persona, game objectives, and prompt instructions, along with the player's prompt, to generate responses, aiming to create a rich and immersive dialogue experience. The prompt instructions describe how the prompt should be structured and are also where the zero-, one-, and few-shot learning examples are provided. These variants let us test the LLM's ability to provide more accurate and meaningful responses during interactions.
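For illustration, the sketch below shows one way such a turn could be assembled with the google.generativeai client: the persona, objectives, and prompt instructions are concatenated with the player's message and sent to Gemini Pro. The persona and objective strings and the function name are placeholders, not the exact text used in Narrative Nexus.

```python
# Minimal sketch of a single NPC turn, assuming the google.generativeai client.
# The persona and objective strings below are placeholders, not the exact
# Narrative Nexus prompts.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-pro")

NPC_PERSONA = "You are Captain Jaela, a spacefaring explorer in Echoes of Eternity..."
GAME_OBJECTIVES = "Guide the player toward the mission objectives and their hints..."
PROMPT_INSTRUCTIONS = (
    "Based on your background and game objectives above, use the character's tone "
    "and way of expression. Answer only what the character would know. "
    "Keep answers direct and brief, with no headings."
)

def npc_reply(player_prompt: str, examples: str = "") -> str:
    """Concatenate persona, objectives, instructions, optional shot examples,
    and the player's message, then ask Gemini Pro for the NPC's answer."""
    full_prompt = "\n\n".join(
        part for part in [
            NPC_PERSONA,
            GAME_OBJECTIVES,
            PROMPT_INSTRUCTIONS,
            examples,                          # empty string for zero-shot
            f"Prompt: {player_prompt}\nAnswer:",
        ] if part
    )
    return model.generate_content(full_prompt).text
```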
To enable the player to interact with the NPC, we built a Flask application. The app offers two main ways for players to interact: they can type their questions or speak them out loud. When a player chooses to use voice, their spoken words are transformed into text and then processed by the language model; speech-to-text is handled by the speech_recognition package in Python. If they opt for typing, there is a textbox for entering their queries.
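A minimal sketch of the two input paths is shown below, assuming Flask and speech_recognition; the route names and form field are illustrative, and npc_reply() refers to the earlier sketch.

```python
# Sketch of the typed and spoken input paths, assuming Flask and
# speech_recognition; route names and form fields are illustrative.
import speech_recognition as sr
from flask import Flask, request

app = Flask(__name__)
recognizer = sr.Recognizer()

@app.route("/ask_text", methods=["POST"])
def ask_text():
    # Typed path: the question comes straight from the textbox.
    question = request.form["question"]
    return {"answer": npc_reply(question)}  # npc_reply() from the earlier sketch

@app.route("/ask_voice", methods=["POST"])
def ask_voice():
    # Voice path: record from the microphone, transcribe, then query the model.
    with sr.Microphone() as source:          # microphone access requires PyAudio
        audio = recognizer.listen(source)
    question = recognizer.recognize_google(audio)  # speech-to-text
    return {"answer": npc_reply(question)}
```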

Once a query is asked, whether through voice or text, it is sent to the NPC. The model then generates a response, which is displayed on screen for the player to read. Additionally, we make the interaction more engaging by converting the model's text responses back into speech, so players can not only see but also hear the NPC's replies, making for a more dynamic and immersive game experience. The player can continue the conversation by asking the NPC a follow-up question.
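The write-up does not name the text-to-speech library, so the snippet below uses gTTS purely as an illustration of how the NPC's text answer could be turned into audio for playback.

```python
# Illustrative text-to-speech step using gTTS (the actual library used in
# Narrative Nexus is not specified in this write-up).
from gtts import gTTS

def speak_reply(answer_text: str, out_path: str = "npc_reply.mp3") -> str:
    """Convert the NPC's text answer into an audio file the page can play back."""
    gTTS(text=answer_text, lang="en").save(out_path)
    return out_path
```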

Data and Code
The data used to provide context for the NPC can be found here.
The code used to build Narrative Nexus can be found here.
DEMO
Demo video explaining the project and walking through Narrative Nexus
Demo video showing the audio playback of the NPC-generated answer
Prompt Instruction
These instructions are sent to the NPC with every query to ensure the responses are generated in a particular format:
Based on your background and game objectives above, think about your character's tone and way of expression and consider the knowledge and familiarities you will be aware of. You must use the character's tone and way of expression when talking to the player.  
Answer only what you might know based on the character's knowledge and familiar things.
 Make sure the answers are direct and brief. Only print the answer to the last prompt, do not print the prompt. Answer with no headings, no subheadings.

Zero Shot Learning
In this setting, we provide no examples in the prompt instruction.
One Shot Learning
We provide one example at the end of the prompt instructions:
For example: 
Prompt: What is the weather right now?
Answer: As a spacefarer who traverses vast cosmic distances, I am not attuned to local weather conditions on specific planets or regions. My knowledge and expertise lie in navigating interstellar routes, deciphering ancient artifacts, and unraveling cosmic mysteries. I do not possess the ability to make educated guesses about current weather patterns.

Few Shot Learning
We provide two examples at the end of the prompt instruction.
For example: 
Prompt: What is the weather right now?
Answer: As a spacefarer who traverses vast cosmic distances, I am not attuned to local weather conditions on specific planets or regions. My knowledge and expertise lie in navigating interstellar routes, deciphering ancient artifacts, and unraveling cosmic mysteries. I do not possess the ability to make educated guesses about current weather patterns.

Prompt: Will you take care of me?
Answer: While I value the well-being of my crew and companions during our cosmic journeys, my primary focus lies in exploring the vast expanse of the universe and unraveling its mysteries. My responsibilities as a captain and explorer demand my attention and dedication. I cannot offer personal care or protection beyond the scope of our shared mission and objectives.
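Putting the three settings together, a small helper like the hypothetical build_examples() below could choose which examples to append before calling npc_reply() from the earlier sketch; the example strings are abbreviated versions of the ones shown above.

```python
# Hypothetical helper for switching between zero-, one-, and few-shot prompts.
# The example strings are abbreviated versions of the ones shown above.
WEATHER_EXAMPLE = (
    "Prompt: What is the weather right now?\n"
    "Answer: As a spacefarer who traverses vast cosmic distances, I am not "
    "attuned to local weather conditions..."
)
CARE_EXAMPLE = (
    "Prompt: Will you take care of me?\n"
    "Answer: While I value the well-being of my crew and companions, my primary "
    "focus lies in exploring the universe..."
)

def build_examples(shots: str) -> str:
    if shots == "zero":
        return ""                                   # no examples appended
    if shots == "one":
        return "For example:\n" + WEATHER_EXAMPLE   # one example appended
    return "For example:\n" + WEATHER_EXAMPLE + "\n\n" + CARE_EXAMPLE  # few-shot

# Usage: npc_reply("What should I do next?", examples=build_examples("few"))
```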

Types of Questions
For hint questions, we appended "Give the user ONLY ONE of four hints from the game objectives at random. Only output the one hint do not give any extra information." to the end of the prompt instruction.
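As a sketch, the hint-specific suffix could be attached only when a hint question is detected; the trigger-phrase check below (matching phrases such as "I need help") is an assumption made for illustration, reusing PROMPT_INSTRUCTIONS from the earlier sketch.

```python
# Sketch of appending the hint-only instruction for hint questions; the
# trigger-phrase check is an illustrative assumption.
HINT_SUFFIX = (
    "Give the user ONLY ONE of four hints from the game objectives at random. "
    "Only output the one hint do not give any extra information."
)

def instructions_for(player_prompt: str) -> str:
    base = PROMPT_INSTRUCTIONS                  # from the earlier sketch
    if "hint" in player_prompt.lower() or "i need help" in player_prompt.lower():
        return base + " " + HINT_SUFFIX         # hint questions get the extra rule
    return base
```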
Evaluation Rubrics
We evaluated the responses to each type of question using qualitative and quantitative metrics. The figures below show the various metrics used for this project.
Results
Three examples of NPC-generated answers for zero-, one-, and few-shot learning
These results are averaged across three interviewees. Please refer to this link to see the entire evaluation metric table.
Summarized tabular results for each metric averaged across three interviewees
The plots above show the quantitative metrics averaged across all question types. The NPC with few-shot learning prompts generates answers the slowest, with an average time above 6 seconds, while the NPC with zero-shot learning is the fastest, averaging around 3 seconds. The accuracy of the generated answers (for specific and task-based questions) is highest for zero-shot learning and lowest for one-shot learning, with an improvement from one-shot to few-shot. Interestingly, zero- and few-shot learning always paraphrase the hints, while one-shot learning paraphrased them only 66.66% of the time.
Qualitative Metrics are all scored on a 1-5 scale.
For the qualitative metrics in the bar chart above (for open-ended questions), the interviewees rated the answers generated with the zero-shot and few-shot prompts similarly, while they were not satisfied with the one-shot answers. What the interviewees noticed was that while the one-shot answers showed reasonable knowledge and alignment with the character's background, they did not make much logical sense and drifted into random aspects that did not answer the question.
Discussion
Upon showcasing the demo to the three interviewees, several observations stood out. The first is the difference between prompting strategies. The zero-shot model surprised both the team and the interviewees with its ability to generate answers that are logically sound yet not pre-written into the system. We received reliable answers from the zero-shot model on open-ended questions, which is remarkable. We also observed that zero-shot learning performs comparably with few-shot learning across all qualitative metrics (alignment, logic, fluency, and knowledge) while producing shorter and quicker responses (roughly 45 fewer words and about 3 seconds faster).

Secondly, the overall performance of the zero-shot and few-shot models proved satisfactory compared with the one-shot prompt model. This was most evident in the completeness and logic of the answers we received to our assessment questions. The one-shot prompt model, however, often produced tediously long answers with bullet points that made no sense and did not truly address the questions we posed. We noticed a drop in performance for one-shot learning across both quantitative and qualitative metrics. We think this might be because it attempts to mimic the style of the single example given without focusing on the actual content of the answer; it tries to generate answers similar to the example but tends to go off track, as there is not enough direction for it to learn from. The one-shot prompts also generate answers with more words than zero- and few-shot when asked open-ended questions. We suspect that with zero-shot learning, since there are no examples to guide it, the model focuses on the context and tries to answer briefly and directly, while with few-shot learning the multiple examples help guide the answer, so it performs better than one-shot. Another point worth mentioning is the subjectivity in the answers from the one-shot prompt model, which makes it far more unreliable.

Thirdly, we found that most hint answers were accurate paraphrases, except for one example from the one-shot prompt model that contained only half of the hint. This is a positive result, since paraphrasing without losing key information makes the NPC feel more vivid and helps the gamer better understand the hint for their mission.

For quick and effective conversations, zero-shot learning appears to be the best way for a gamer to interact with the NPC. However, if the gamer wants a more creative and longer response, few-shot-learning-based NPCs could be recommended. We and the interviewees feel that one-shot-learning-based NPCs should not be used to enhance gamer-NPC interaction.

Future Direction
The demo shows the feasibility of the idea and an attempt at discovering ways to enhance the interaction between a player and an NPC through the use of LLMs. Our goal is to embed every NPC with its own tone, background, level of knowledge, and unique tasks and hints waiting for the gamer to discover. In such a game, the user would feel more like they are living in a science-fiction world than playing a video game, encouraging them to further explore, understand, and dive into that world. Within the scope of this project, the experiment succeeded in discovering ways to improve interaction: we explored various types of questions and examined how zero-, one-, and few-shot learning affect the NPC's generated responses.
 
A key feature we look forward to adding is letting the NPC recognize pictures from the gamer and hold a conversation based on the picture content. This functionality is expected to be achieved by incorporating Gemini Pro Vision and could further expand the possibilities of the game. For now, Gemini Pro Vision only answers questions about the pictures it is given, and we cannot yet have extended conversations grounded in those pictures. Therefore, our team looks forward to future versions of Gemini Pro Vision and to realizing this concept with them.

Another future improvement concerns more intelligent AI models and responses. For now, the NPC gives hints at random when prompted with "I need help". In the future, we want it to assign a specific task to the gamer based on the conversation: if the gamer indicates a preference or asks questions about a specific figure or place, the NPC should recognize that preference and prioritize the corresponding task and hint. We also want an NPC to change its tone and style during different stages of the game: it could be timid and afraid when the environment is unfamiliar and villains occupy the city, yet happy and talkative when the city is secure and safe.